home *** CD-ROM | disk | FTP | other *** search
- Path: zac001p06.zone.ca!user
- From: michel_saucier@cmq.qc.ca (Michel Saucier)
- Newsgroups: comp.lang.c
- Subject: Re: Speed of random generators
- Date: Sat, 09 Mar 1996 12:56:41 -0400
- Organization: Bell Global Solutions
- Distribution: world
- Message-ID: <michel_saucier-0903961256410001@zac001p06.zone.ca>
- References: <TAKAHASI.96Mar1113825@poisson.ece.cmu.edu>
- NNTP-Posting-Host: zac001p06.zone.ca
- X-Newsreader: Yet Another NewsWatcher 2.2.0b4
-
- In article <TAKAHASI.96Mar1113825@poisson.ece.cmu.edu>,
- takahasi@poisson.ece.cmu.edu (Eduardo S. C. Takahashi) wrote:
-
- > I'm working in a simulation that requires intensive use of pseudo-random
- > sequences. My problem is that profiling the program I found out that the
- > generation of the sequences is responsible for about 50% of the total
- > execution time. Since each run can take about 4 hours, it would be good
- > if I could save some time.
- >
- > Presently, I'm using the function erand48(), and I'm also making some
- > tests with ramdom(). If anyone has a suggestion I would really appreciate!
-
- Here is the fastest pseudo-random number generator this side of chaos.
- It compiles to about 20 machine instructions, all 1-clock type, except for
- [cached] memory references, which take 1 or 2 clocks, depending on scheduling.
- All in all, alea() takes 30 clock cycles or less to generate a good
- quality 32 bits
- pseudo-random number. Don't take my word for it. Test it.
-
- As a comparison, in most congruential generators (the most common type of RNG)
- there is two MULT (25 clocks) and two DIV instructions, each one costing
- about 40 clock cycles,
- for a total of about 140 clocks by value generated.
-
- /* alea(): an halfway decent random number generator */
- static unsigned long AInd=0;
- static unsigned long AMem[64]; /* 32 values are needed, extended to 64 for
- a speedup */
- unsigned long alea(void) /* LFSR-based random generator, with a
- 2*32=64 longwords memory */
- { unsigned long *p=AMem+(AInd++%32); /* 32 LFSR are maintained in
- parallel, bitwise */
- return *p=*(p+32)=*p^*(p+1)^*(p+2)^*(p+4)^*(p+6)^*(p+31);
- /* each value is kept twice in the array, to speed up access */
- }
-
- long ralea(unsigned long range) /* return a pseudorandom value in 1..range */
- { return 1+alea()%range;}
-
- static long LHseed=1; /* please kindly CHANGE ME */
- unsigned long lehmer(void)/* Lehmer algorithm, Schrage's method to avoid
- overflow */
- {long hi=LHseed/127773L,lo=LHseed%127773L;
- long t=lo*16807L - hi*2836L;
- return LHseed=(t>0)?t:t+2147483647L;
- }
-
- void initAMem(void) /* initialize the AMem[0..31] array columnwise */
- { int i,j; unsigned long t;
- LHseed=(unsigned long)clock();
- for(i=0; i<32; i++)
- { t=lehmer()+lehmer(); /* lehmer is only 31 bits */
- for(j=0; j<32; j++)AMem[j]+=(t>>j&1)<<i; /* put bit t{j} in bit
- AMem[j]{i} */
- }
- }
-
-
- You must call the initAMem() function once before using alea(). You can
- eliminate the
- calls to lehmer() and supply any 32-bits non-zero values, provided they
- are all different
- and contain no obvious pattern. 1,2,3,4, .. 32 are horrendous startup values.
-
- The alea() function maintains 32 LFSRs in lockstep. Each bit of the output
- longword is
- the output bit of a LFSR keyed with a different startup value, so each bit
- follow a
- different sequence than the others.
-
- For maximum speed, use alea() in a short loop, to ensure that the AMem[64]
- array stays in
- the on-chip cache. Moreover, avoid casting the alea() result to a float
- for the common practice
- of scaling it into the 0.0 to 1.0 range. That operation will cost far more
- than the whole
- call to alea(). Even a simple modulus "1+alea()%100" is as costly as
- alea() itself.
-
-
- int vec[100000];
- for(int i=0; i<100000; i++) vec[i]=alea()%1024 +1; /* recommended use */
-
- On my slowpoke 25 Mhz 68040 Mac, I can make about half a million calls to
- alea() by second;
- 543261 calls/sec, last benchmark. In the same benchmark, the lehmer()
- function, which is typical
- for a linear congruential RNG, averaged 144313 calls/sec. So alea() is
- three to four times faster
- than the RNG you use. Likely even faster.
-
-
- "JAMAIS un coup de dΘs n'abolira le HASARD,
- fut-il roulΘ dans des circonstances Θternelles." (StΘphane MallarmΘ)
-
- Michel Saucier. michel_saucier@cmq.qc.ca
-